class: center, middle, inverse, title-slide

# Lecture 2
## New variables and Plots
### Psych 10 C
### University of California, Irvine
### 03/30/2022

---

## Recognition Memory

- The data that we will work on today come from a recognition memory experiment.

--

- In the experiment, we asked participants to look at a list of 50 words that would be used on a later test.

--

- After 5 minutes had passed, we showed each participant 100 words one at a time and asked them whether the word on the screen was on the original list or not.

--

- Then, after one hour, we ran another test, again presenting participants one word at a time and asking them whether the word was on the original list.

--

- We assigned each participant an ID number for the experiment and recorded their age and the number of words that they correctly recognized as being part of the original list of 50 words on each test.

--

- We organized all these data in R and saved them as a `.csv` file that we want to look at now.

--

- How should we start?

---

## Load data into R

- We will keep working with the memory data from last class.
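
As a reminder of what loading a `.csv` might look like, here is a minimal sketch. The file name `memory.csv` and the use of `readr` are assumptions (the slides do not give the path), and the four rows are a toy copy of the first rows printed in these slides, not the full data set.

```r
library(readr)

# Toy stand-in for the course file: only the first four rows shown in the slides.
toy <- data.frame(
  id        = 1:4,
  age       = c(20, 29, 29, 25),
  correct   = c(46, 49, 48, 44),
  time_test = c(300, 300, 300, 300)
)

# Hypothetical file name, written to a temporary folder so the sketch runs anywhere.
path <- file.path(tempdir(), "memory.csv")
write_csv(toy, path)

# Read the csv into R and take a first look at it.
memory <- read_csv(path, show_col_types = FALSE)
head(x = memory, n = 4)
```

With the real file, only the `read_csv()` line is needed, pointing at wherever the `.csv` was saved.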
---

## Creating new variables

- Sometimes we want to work with a transformation of the variables that we have in a data file.

--

- For example, we may be interested in knowing how many of the correctly recalled words came from the first or the second test. We can make a new variable, based on the old ones, that contains this information.

--

- This can make some plots easier to make!

---

## Creating a new variable

- We will create a new variable that holds the label "test_1" whenever the test time (`time_test`) was 300 seconds after the study phase, and "test_2" when the test time was 3600 seconds after:

--

```r
# mutate() and %>% come from dplyr (loaded with the tidyverse)
# create new variable using pipes! %>% this takes the output
# of the preceding line and uses it as input on the next one
memory <- memory %>%
  mutate("test_id" = ifelse(test = time_test == 300,
                            yes = "test_1",
                            no = "test_2"))

# look at the first 4 rows of the data
head(x = memory, n = 4)
```

```
# A tibble: 4 × 5
     id   age correct time_test test_id
  <dbl> <dbl>   <dbl>     <dbl> <chr>
1     1    20      46       300 test_1
2     2    29      49       300 test_1
3     3    29      48       300 test_1
4     4    25      44       300 test_1
```

---

## Creating a new variable

- The **`mutate()`** function allows us to create new variables from the ones already available in the dataset. **`ifelse()`** is another function, used here to set the condition that decides which value the new variable takes on each row.

--

- How can we modify the code above to create a new variable that labels participants as "young" if they are younger than 35 and "elderly" otherwise?

- You have 3 minutes.

--

- **ANS:**

.can-edit.key-likes[
- 
]

---
class: inverse, center, middle

# Plotting
## Histograms

---

## Histograms

- A histogram shows how many times the values in each range appear in our data.

--

- It is constructed by dividing the range of the variable into intervals and counting the number of data points that fall in each.

--

- We can look at an example using our **`age`** variable.
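
The interval-and-count idea can be sketched with `geom_histogram()`. This is only an illustration: the ages below are made-up toy values (not the course data), and the two bin widths are arbitrary choices picked to show that the same data can be cut into many narrow intervals or a few wide ones.

```r
library(ggplot2)

# Made-up ages for illustration only; these are NOT the experiment's data.
toy <- data.frame(age = c(20, 29, 29, 25, 41, 38, 33, 52, 47, 23))

# Narrow intervals: each bar covers 2 years.
p_narrow <- ggplot(data = toy) +
  aes(x = age) +
  geom_histogram(binwidth = 2)

# Wide intervals: each bar covers 10 years.
p_wide <- ggplot(data = toy) +
  aes(x = age) +
  geom_histogram(binwidth = 10)

# Print either object to draw it.
p_narrow
```

Both plots count the same 10 observations; only the way they are grouped into bars changes.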
--

.pull-left[
<img src="data:image/png;base64,#lec-2_files/figure-html/hist-age-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#lec-2_files/figure-html/hist-age2-1.png" style="display: block; margin: auto;" />
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
*ggplot(data = memory)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
* aes(x = correct)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
* aes(fill = test_id, color = test_id)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
* geom_histogram(position="identity",
*                binwidth = 1,
*                alpha = 0.4)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position="identity",
                 binwidth = 1,
                 alpha = 0.4) +
* theme_classic()
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position="identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
* xlab("Number of correct recalls")
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position="identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
  xlab("Number of correct recalls") +
* ylab("Frequency")
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: 
false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position="identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
  xlab("Number of correct recalls") +
  ylab("Frequency") +
* guides(fill = guide_legend("Test order"), color = "none")
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls
.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position="identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
  xlab("Number of correct recalls") +
  ylab("Frequency") +
  guides(fill = guide_legend("Test order"), color = "none") +
* theme(axis.title.x = element_text(size = 20),
*       axis.title.y = element_text(size = 20))
```
]

.panel2-hist-code-auto[
<!-- -->
]

<style>
.panel1-hist-code-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-hist-code-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-hist-code-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---

## Histograms

- One of the main problems with histograms is that their shape depends on our choice of the width of the bars!

--

- A change in the shape can change our interpretation of the results, so we need to be careful when making our choice.

--

- Histograms can be used for numeric variables.

---
class: inverse, center, middle

# Plotting
## Box-plots

---

## Box-plots

.pull-left[
1. Box: has 3 marks. The two edges represent the first and third quartiles (the 25% and 75% quantiles), and the line inside represents the median (the 50% quantile). (We will define quantiles in the next lecture.)

1. Whiskers: extend to the largest (smallest) observation that lies no more than 1.5 times the distance between the first and third quartiles above (below) the box.

1. 
Everything beyond the whiskers is considered an outlier.
]

.pull-right[
<img src="data:image/png;base64,#lec-2_files/figure-html/ex-bp-1.png" style="display: block; margin: auto;" />
]

--

- We can use the same data as before for an example.

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
*ggplot(data = memory)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
* aes(y = correct)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
* aes(x = test_id)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
* aes(color = test_id)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
* scale_color_brewer(palette="Dark2")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette="Dark2") +
* geom_boxplot(fill = "white")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette="Dark2") +
  geom_boxplot(fill = "white") +
* xlab("Test order")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette="Dark2") +
  geom_boxplot(fill = "white") +
  xlab("Test order") +
* ylab("Number of correct recalls")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette="Dark2") +
  geom_boxplot(fill = "white") +
  xlab("Test order") +
  ylab("Number of correct recalls") +
* guides(fill = "none", color = "none")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot correct responses
.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette="Dark2") +
  geom_boxplot(fill = "white") +
  xlab("Test order") +
  ylab("Number of correct recalls") +
  guides(fill = "none", color = "none") +
* theme(axis.title.x = element_text(size = 20),
*       axis.title.y = element_text(size = 20))
```
]

.panel2-bp-code-auto[
<!-- -->
]

<style>
.panel1-bp-code-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-bp-code-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-bp-code-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---

## Box plots

- Box plots show us how our data are dispersed. For example, the numbers of correctly recalled words are closer together in the first test than in the second.

--

- We can also see that the median number of correctly recalled words was higher on test one.

--

- This plot also allows us to evaluate whether our data are spread symmetrically around the median, or whether they are skewed towards one of the ends.

--

- There are some variables that we would not expect to be symmetric (think about reaction times in a game).

---
class: inverse, center, middle

# Plotting
## Scatter plots

---

## Scatter plots

- Histograms are useful when we have a single numeric variable.

--

- Box plots are very informative about the variability in our data.
--

- Scatter plots are useful when we want to see how two numeric variables "change" together.

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
*ggplot(data = memory)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
* aes(y = correct)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
* aes(x = age)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
* aes(color = test_id)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
* geom_point(fill = "white")
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
* xlab("Age")
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
* ylab("Number of correct recalls")
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
  ylab("Number of correct recalls") +
* guides(fill = "none", color = guide_legend("Test order"))
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: 
false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
  ylab("Number of correct recalls") +
  guides(fill = "none", color = guide_legend("Test order")) +
* theme(axis.title.x = element_text(size = 20),
*       axis.title.y = element_text(size = 20))
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot correct responses vs age
.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
  ylab("Number of correct recalls") +
  guides(fill = "none", color = guide_legend("Test order")) +
  theme(axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20)) +
* geom_smooth(method = lm)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

<style>
.panel1-scatter-code-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-scatter-code-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-scatter-code-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>